Methods and apparatuses for vehicle appearance feature recognition, methods and apparatuses for vehicle retrieval, storage medium, and electronic devices

ABSTRACT

The method for vehicle appearance feature recognition includes: multiple region segmentation results of a target vehicle are obtained from an image to be recognized; global feature data and multiple pieces of region feature data are extracted from the image to be recognized based on the multiple region segmentation results; and the global feature data and the multiple pieces of region feature data are fused to obtain appearance feature data of the target vehicle.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present disclosure is a U.S. continuation application of U.S. application Ser. No. 16/678,870, filed on Nov. 8, 2019, which is a U.S. continuation application of International Application No. PCT/CN2018/093165, filed on Jun. 27, 2018, which claims benefit of Chinese Patent Application No. 201710507778.5, filed to the Chinese Patent Office on Jun. 28, 2017. The disclosures of U.S. application Ser. No. 16/678,870, International Application No. PCT/CN2018/093165, and Chinese Patent Application No. 201710507778.5 are incorporated herein by reference in their entireties.

BACKGROUND

A vehicle retrieval task refers to providing a vehicle image to be queried, and retrieving, from a large-scale vehicle image database, all images of the vehicle contained in that image.

SUMMARY

Embodiments of the present disclosure relate to artificial intelligence technologies, and in particular to methods and apparatuses for vehicle appearance feature recognition, storage medium, and electronic devices, as well as methods and apparatuses for vehicle retrieval, storage medium, and electronic devices.

An objective of embodiments of the present disclosure is to provide technical solutions for vehicle appearance feature recognition and technical solutions for vehicle retrieval.

According to a first aspect of the embodiments of the present disclosure, a method for vehicle appearance feature recognition is provided, including: obtaining multiple region segmentation results of a target vehicle from an image to be recognized; extracting global feature data and multiple pieces of region feature data from the image to be recognized based on the multiple region segmentation results; and fusing the global feature data and the multiple pieces of region feature data to obtain appearance feature data of the target vehicle.

According to a second aspect of the embodiments of the present disclosure, a method for vehicle retrieval is provided. The method includes: obtaining appearance feature data of a target vehicle in an image to be retrieved by means of the method according to the first aspect of the embodiments of the present disclosure; and searching a candidate vehicle image library for a target candidate vehicle image matching the appearance feature data.

According to a third aspect of the embodiments of the present disclosure, an apparatus for vehicle appearance feature recognition is provided. The apparatus includes: a first obtaining module, configured to obtain multiple region segmentation results of a target vehicle from an image to be recognized; an extraction module, configured to extract global feature data and multiple pieces of region feature data from the image to be recognized based on the multiple region segmentation results; and a fusion module, configured to fuse the global feature data and the multiple pieces of region feature data to obtain appearance feature data of the target vehicle.

According to a fourth aspect of the embodiments of the present disclosure, an apparatus for vehicle retrieval is provided. The apparatus includes: a second obtaining module, configured to obtain appearance feature data of a target vehicle in an image to be retrieved by means of the apparatus according to the third aspect of the embodiments of the present disclosure; and a searching module, configured to search a candidate vehicle image library for a target candidate vehicle image matching the appearance feature data.

According to a fifth aspect of the embodiments of the present disclosure, provided is a computer readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause steps of the method for vehicle appearance feature recognition according to the first aspect of the embodiments of the present disclosure to be implemented.

According to a sixth aspect of the embodiments of the present disclosure, provided is a computer readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause steps of the method for vehicle retrieval according to the second aspect of the embodiments of the present disclosure to be implemented.

According to a seventh aspect of the embodiments of the present disclosure, provided is an electronic device, including a first processor, a first memory, a first communication element, and a first communication bus, wherein the first processor, the first memory, and the first communication element are in communication with each other by means of the first communication bus; and the first memory is configured to store at least one executable instruction which enables the first processor to execute the steps of the method for vehicle appearance feature recognition according to the first aspect of the embodiments of the present disclosure.

According to an eighth aspect of the embodiments of the present disclosure, provided is an electronic device, including a second processor, a second memory, a second communication element, and a second communication bus, wherein the second processor, the second memory, and the second communication element are in communication with each other by means of the second communication bus; and the second memory is configured to store at least one executable instruction which enables the second processor to execute the steps of the method for vehicle retrieval according to the second aspect of the embodiments of the present disclosure.

The following further describes in detail the technical solutions of the present disclosure with reference to the accompanying drawings and embodiments.

BRIEF DESCRIPTION OF DRAWINGS

The accompanying drawings constituting a part of the specification describe the embodiments of the present disclosure and are intended to explain the principles of the present disclosure together with the descriptions.

According to the following detailed descriptions, the present disclosure can be understood more clearly with reference to the accompanying drawings.

FIG. 1 is a flowchart of one embodiment of a method for vehicle appearance feature recognition according to the present disclosure.

FIG. 2 is a flowchart of another embodiment of the method for vehicle appearance feature recognition according to the present disclosure.

FIG. 3 is a schematic diagram showing distribution of vehicle key points for implementing the method embodiment of FIG. 2.

FIG. 4 is a schematic diagram of a network framework for implementing the method embodiment of FIG. 2.

FIG. 5 is a schematic diagram showing a vehicle region segmentation result for implementing the method embodiment of FIG. 2.

FIG. 6 is a schematic diagram showing a weight value of a vehicle region for implementing the method embodiment of FIG. 2.

FIG. 7 is a flowchart of one embodiment of a method for vehicle retrieval according to the present disclosure.

FIG. 8 is a flowchart of another embodiment of the method for vehicle retrieval according to the present disclosure.

FIG. 9 is a schematic diagram showing a similarity distance of a vehicle for implementing the method embodiment of FIG. 8.

FIG. 10 is a schematic structural diagram of one embodiment of an apparatus for vehicle appearance feature recognition according to the present disclosure.

FIG. 11 is a schematic structural diagram of another embodiment of the apparatus for vehicle appearance feature recognition according to the present disclosure.

FIG. 12 is a schematic structural diagram of one embodiment of an apparatus for vehicle retrieval according to the present disclosure.

FIG. 13 is a schematic structural diagram of another embodiment of the apparatus for vehicle retrieval according to the present disclosure.

FIG. 14 is a schematic structural diagram of one embodiment of an electronic device applicable to realize a terminal device or a server according to the embodiments of the present disclosure.

FIG. 15 is a schematic structural diagram of another embodiment of the electronic device applicable to realize the terminal device or the server according to the embodiments of the present disclosure.

DETAILED DESCRIPTION

Various exemplary embodiments of the present disclosure are now described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise stated specifically, relative arrangement of the components and steps, the numerical expressions, and the values set forth in the embodiments are not intended to limit the scope of the present disclosure.

In addition, it should be understood that, for ease of description, the size of each part shown in the accompanying drawings is not drawn in actual proportion.

The following descriptions of at least one exemplary embodiment are merely illustrative, and are not intended to limit the present disclosure or its applications or uses.

Technologies, methods and devices known to a person of ordinary skill in the related art may not be discussed in detail, but such technologies, methods and devices should be considered as a part of the specification in appropriate situations.

It should be noted that similar reference numerals and letters in the following accompanying drawings represent similar items. Therefore, once an item is defined in an accompanying drawing, the item does not need to be further discussed in the subsequent accompanying drawings.

The embodiments of the present disclosure may be applied to electronic devices such as terminal devices, computer systems, and servers, which may operate with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use together with the electronic devices such as terminal devices, computer systems, and servers include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set top boxes, programmable consumer electronics, network personal computers, small computer systems, large computer systems, distributed cloud computing environments that include any one of the above systems, and the like.

The electronic devices such as terminal devices, computer systems, and servers may be described in the general context of computer system executable instructions (such as program modules) executed by the computer systems. Generally, the program modules may include routines, programs, target programs, components, logics, data structures, and the like, to perform specific tasks or implement specific abstract data categories. The computer systems/servers may be practiced in the distributed cloud computing environments in which tasks are performed by remote processing devices that are linked through a communications network. In the distributed computing environments, the program modules may be located in local or remote computing system storage media including storage devices.

FIG. 1 is a flowchart of one embodiment of a method for vehicle appearance feature recognition according to the present disclosure.

Referring to FIG. 1, at step S101, multiple region segmentation results of a target vehicle are obtained from an image to be recognized.

In some embodiments, in terms of the contents included in the image, the image to be recognized may be an image including a part of the target vehicle or an image including the whole target vehicle, etc. In terms of the category of the image, the image to be recognized may be a photographed static image, or a video image in a video frame sequence, and may also be a synthetic image, etc. The multiple region segmentation results respectively correspond to regions of different orientations of the target vehicle. According to one or more embodiments of the present disclosure, the multiple region segmentation results may include, but are not limited to, segmentation results of a front side, a rear side, a left side, and a right side of the target vehicle. Certainly, in some embodiments of the present disclosure, the multiple region segmentation results are not limited to the segmentation results of the four regions including the front side, the rear side, the left side, and the right side of the target vehicle. For example, the multiple region segmentation results may further include segmentation results of six regions including the front side, the rear side, the left side, the right side, the top, and the bottom of the target vehicle, or segmentation results of eight regions including the front side, the rear side, the left side, the right side, the left front, the right front, the left rear, and the right rear. A region segmentation result is a single-channel weight map, and the value at each position in the region segmentation result indicates the importance degree of the corresponding position in the image to be recognized: the larger the value, the more important the corresponding position; the smaller the value, the less important the corresponding position.
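As a purely illustrative sketch of this data structure, a region segmentation result can be held as a single-channel array whose entries weight image positions by importance; the 4*4 size and the values below are assumptions for illustration, not values fixed by the present disclosure.

    import numpy as np

    # Single-channel weight map for one region: larger entries mark
    # positions in the image that matter more for that region.
    front_segmentation_result = np.array([
        [0.0, 0.1, 0.1, 0.0],
        [0.1, 0.8, 0.9, 0.1],
        [0.1, 0.9, 1.0, 0.1],
        [0.0, 0.1, 0.1, 0.0],
    ])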

In one embodiment, step S101 may be performed by a processor by invoking a corresponding instruction stored in a memory, and may also be performed by a first obtaining module 501 run by the processor.

At step S102, global feature data and multiple pieces of region feature data are extracted from the image to be recognized based on the multiple region segmentation results.

The extracted global feature data and multiple pieces of region feature data are those of the target vehicle. The global feature data is a vector-represented global feature, and the region feature data is a vector-represented region feature.

In one embodiment, step S102 may be performed by a processor by invoking a corresponding instruction stored in a memory, and may also be performed by an extraction module 502 run by the processor.

At step S103, the global feature data and the multiple pieces of region feature data are fused to obtain appearance feature data of the target vehicle.

In the case where both the global feature data and the region feature data are represented by vectors, the dimension of the global feature vector is the same as the dimension of the region feature vector. The appearance feature data of the target vehicle includes features of multiple local regions of the target vehicle and features of a global region of the target vehicle.

In one embodiment, step S103 may be performed by a processor by invoking a corresponding instruction stored in a memory, and may also be performed by a fusion module 503 run by the processor.

According to the method for vehicle appearance feature recognition in the embodiment, multiple region segmentation results of a target vehicle are obtained from an image to be recognized, then global feature data and multiple pieces of region feature data are extracted from the image to be recognized based on the multiple region segmentation results, and the global feature data and the multiple pieces of region feature data are fused to obtain appearance feature data of the target vehicle. Compared with the method for obtaining vehicle appearance features in the prior art, the method for vehicle appearance feature recognition in the embodiments recognizes not only the global features of the vehicle appearance but also the features of its local regions, and reflects detail information of the target vehicle by means of the local region features, so as to describe the appearance of the vehicle more accurately. In addition, by means of the vehicle appearance features recognized in the embodiments, the vehicle appearance features in different vehicle images may be compared directly, thereby solving the problem that different regions between different vehicle images cannot be compared.

The method for vehicle appearance feature recognition in the embodiments may be executed by any appropriate device having data processing capability, including, but not limited to, a terminal device, a server, and the like.

FIG. 2 is a flowchart of another embodiment of the method for vehicle appearance feature recognition according to the present disclosure.

Referring to FIG. 2, at step S201, multiple region segmentation results of the target vehicle are obtained from the image to be recognized by means of a first neural network for region extraction.

In one embodiment, step S201 may be performed by a processor by invoking a corresponding instruction stored in a memory, and may also be performed by an obtaining sub-module 6011 run by the processor.

The first neural network may be any appropriate neural network that may implement region extraction or target object recognition, and may include, but is not limited to, a convolutional neural network, a reinforcement learning neural network, a generative network in an adversarial neural network, etc. The structure of the neural network may be appropriately set by a person skilled in the art according to actual needs, such as the number of convolution layers, the size of the convolution kernel, the number of channels, etc., which is not limited in the embodiments of the present disclosure. In some embodiments of the present disclosure, the first neural network has a first feature extraction layer and a first computing layer connected to a tail end of the first feature extraction layer.

According to one or more embodiments of the present disclosure, step S201 includes: performing feature extraction on the image to be recognized by means of the first feature extraction layer to obtain multiple key points of the target vehicle; and classifying the multiple key points by means of the first computing layer to obtain multiple key point clusters, and respectively fusing feature maps of key points in the multiple key point clusters, to obtain region segmentation results corresponding to the multiple key point clusters.

Since a vehicle body is usually a solid color and the colors of some vehicles are quite similar, it is difficult to distinguish vehicles according to color alone. Some embodiments therefore extract the region features of the vehicle based on the vehicle key points. In this way, many detailed features of the vehicle may be better reflected in the region features. A vehicle key point in the embodiments is not a boundary point or corner point of the vehicle, but a visually distinctive position on the vehicle or a main component of the vehicle, such as a wheel, a lamp, a logo, a rearview mirror, a license plate, etc. FIG. 3 is a schematic diagram showing distribution of vehicle key points for implementing the method embodiment of FIG. 2. As shown in FIG. 3, the vehicle key points in the embodiments include a left front wheel 1, a left rear wheel 2, a right front wheel 3, a right rear wheel 4, a right fog lamp 5, a left fog lamp 6, a right front headlight 7, a left front headlight 8, a front car logo 9, a front license plate 10, a left rearview mirror 11, a right rearview mirror 12, a right front corner 13 of the roof, a left front corner 14 of the roof, a left rear corner 15 of the roof, a right rear corner 16 of the roof, a left taillight 17, a right taillight 18, a rear car logo 19, and a rear license plate 20. In view of the above, the detailed features of the vehicle may be reflected in the region features, so as to describe the appearance of the vehicle more accurately.

In one or more embodiments, the first feature extraction layer performs feature extraction on the 20 vehicle key points in the input vehicle image to obtain a response feature map for each of the vehicle key points. The first feature extraction layer may be an hourglass network structure. The first feature extraction layer needs to be trained before this step is executed. The training process of the first feature extraction layer may be: designing the target response feature map of an annotated vehicle key point as a Gaussian kernel around the annotated key point position, and then inputting a vehicle image containing the annotated vehicle key point into the first feature extraction layer; determining whether the prediction result of the first feature extraction layer is close to the target Gaussian kernel; and if the prediction result of the first feature extraction layer is not close to the target Gaussian kernel, adjusting parameters of the first feature extraction layer according to the difference between the prediction result and the target Gaussian kernel, and performing repeated iterative training. The prediction result of the first feature extraction layer for the annotated vehicle key point is a Gaussian kernel corresponding to the response feature map of the annotated vehicle key point, and the difference between the prediction result and the target Gaussian kernel may be a cross entropy. FIG. 4 is a schematic diagram of a network framework for implementing the method embodiment of FIG. 2. As shown in part (a) of FIG. 4, the marker regression machine in the first neural network is the representation of the first feature extraction layer.
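The following Python sketch illustrates how such a supervision target could be constructed; the map size and the standard deviation sigma are illustrative assumptions rather than values fixed by the present disclosure.

    import numpy as np

    def keypoint_target_map(cx, cy, height, width, sigma=2.0):
        """Target response feature map for one annotated key point: a
        Gaussian kernel centered at (cx, cy); sigma is an assumed value."""
        ys, xs = np.mgrid[0:height, 0:width]
        return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

    # Example: a 48*48 target map for a key point annotated at (20, 30).
    target = keypoint_target_map(20, 30, 48, 48)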

In some embodiments, some regions of the vehicle are always invisible in a vehicle image photographed at a particular angle. In order to deal with the problem of invisible vehicle key points, the geometric relationship between the vehicle key points may be fully utilized to allocate the 20 vehicle key points into four clusters, for example, C1=[5, 6, 7, 8, 9, 10, 13, 14], C2=[15, 16, 17, 18, 19, 20], C3=[1, 2, 6, 8, 11, 14, 15, 17], and C4=[3, 4, 5, 7, 12, 13, 16, 18]. The vehicle key points in the four clusters correspond to the front, rear, left, and right sides of the vehicle, respectively, and the feature maps of the key points in the respective clusters are then fused to obtain a front segmentation result, a rear segmentation result, a left segmentation result, and a right segmentation result of the vehicle, as shown in part (a) of FIG. 4.
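A minimal Python sketch of this clustering-and-fusion step follows; the clusters are taken from the description above, while element-wise summation as the fusion operator is an assumption.

    import numpy as np

    # Key point clusters from the description (1-based key point indices),
    # corresponding to the front, rear, left, and right sides.
    CLUSTERS = {
        "front": [5, 6, 7, 8, 9, 10, 13, 14],
        "rear": [15, 16, 17, 18, 19, 20],
        "left": [1, 2, 6, 8, 11, 14, 15, 17],
        "right": [3, 4, 5, 7, 12, 13, 16, 18],
    }

    def region_segmentation_results(response_maps):
        """response_maps: array of shape (20, H, W), one response feature
        map per vehicle key point. Returns one single-channel weight map
        per cluster by fusing the member maps (summation is assumed)."""
        return {name: response_maps[[k - 1 for k in keys]].sum(axis=0)
                for name, keys in CLUSTERS.items()}

    masks = region_segmentation_results(np.random.rand(20, 48, 48))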

FIG. 5 is a schematic diagram showing a vehicle region segmentation result for implementing the method embodiment of FIG. 2. As shown in FIG. 5, three vehicle images are arranged sequentially on the left side, and the front segmentation result, the rear segmentation result, the left segmentation result, and the right segmentation result of each vehicle image are arranged sequentially on the right side. As shown in the drawing, the segmentation result of a visible region of the vehicle generally has a higher response than the segmentation result of an invisible region, which indicates that the first feature extraction layer may not only predict the vehicle key points, but also distinguish the visible vehicle key points from the invisible vehicle key points.

At step S202, global feature data and multiple pieces of region feature data of the target vehicle are extracted from the image to be recognized by means of a second neural network for feature extraction based on the multiple region segmentation results.

In one embodiment, step S202 may be performed by a processor by invoking a corresponding instruction stored in a memory, and may also be performed by an extraction sub-module 6021 run by the processor.

The second neural network may be any appropriate neural network that may implement region extraction or target object recognition, and may include, but is not limited to, a convolutional neural network, a reinforcement learning neural network, a generative network in an adversarial neural network, etc. The optional structure of the neural network may be appropriately set by a person skilled in the art according to actual needs, such as the number of convolution layers, the size of the convolution kernel, the number of channels, etc., which is not limited in the embodiments of the present disclosure. In some embodiments, the second neural network has a first processing subnet and multiple second processing subnets separately connected to an output end of the first processing subnet, wherein the first processing subnet has a second feature extraction layer, a first inception module, and a first pooling layer, and each second processing subnet has a second computing layer, a second inception module, and a second pooling layer which are connected to the output end of the first processing subnet. The second feature extraction layer includes three convolution layers and two inception modules, and an inception module may perform convolution operations and pooling operations.

According to one or more embodiments of the present disclosure, step S202 includes: performing a convolution operation and a pooling operation on the image to be recognized by means of the second feature extraction layer to obtain a global feature map of the target vehicle; performing a convolution operation and a pooling operation on the global feature map by means of the first inception module to obtain a first feature map set of the target vehicle; and performing a pooling operation on feature maps in the first feature map set by means of the first pooling layer to obtain a global feature vector of the target vehicle.

In some embodiments, the image to be recognized is first scaled, so that the size of the image to be recognized is 192*192, and the scaled image is then input to a second feature extraction layer composed of three convolution layers and two inception modules; the second feature extraction layer performs a convolution operation and a pooling operation on the scaled image to obtain a global feature map having the spatial size of 12*12. Then, the first inception module performs a convolution operation and a pooling operation on the global feature map to obtain a set of feature maps having the spatial size of 6*6. Finally, the first pooling layer performs a global average pooling operation on the feature maps in the set, to obtain a 1536-dimensional global feature vector.
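A minimal PyTorch sketch of this global branch is given below; only the stated tensor sizes (a 192*192 input, a 12*12 global feature map, a 6*6 feature map set, and a 1536-dimensional vector) follow the description, while the channel widths and the plain convolution blocks standing in for the inception modules are assumptions.

    import torch
    import torch.nn as nn

    class GlobalBranch(nn.Module):
        """Trunk plus global branch sketch; plain convolutions stand in
        for the inception modules, which is an assumption."""
        def __init__(self, channels=1536):
            super().__init__()
            # Second feature extraction layer: 192*192 input -> 12*12 map.
            self.trunk = nn.Sequential(
                nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),          # 96*96
                nn.Conv2d(64, 256, 3, stride=2, padding=1), nn.ReLU(),        # 48*48
                nn.Conv2d(256, channels, 3, stride=2, padding=1), nn.ReLU(),  # 24*24
                nn.MaxPool2d(2),                                              # 12*12
            )
            # Stand-in for the first inception module: 12*12 -> 6*6.
            self.inception = nn.Sequential(
                nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU())
            # First pooling layer: global average pooling to one vector.
            self.pool = nn.AdaptiveAvgPool2d(1)

        def forward(self, image):
            fmap = self.trunk(image)                  # global feature map, 12*12
            fset = self.inception(fmap)               # first feature map set, 6*6
            return fmap, self.pool(fset).flatten(1)   # 1536-dim global vector

    global_map, global_vec = GlobalBranch()(torch.randn(1, 3, 192, 192))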

According to one or more embodiments of the present disclosure, step S202 may further include: performing point multiplication on the multiple region segmentation results and the global feature map separately by means of the second computing layer, to obtain local feature maps respectively corresponding to the multiple region segmentation results; performing a convolution operation and a pooling operation on the local feature maps of the multiple region segmentation results by means of the second inception module to obtain a second feature map set corresponding to the multiple region segmentation results; and performing a pooling operation on the second feature map set of the multiple region segmentation results by means of the second pooling layer to obtain first region feature vectors corresponding to the multiple region segmentation results.

According to one or more embodiments of the present disclosure, before performing point multiplication on the multiple region segmentation results and the global feature map separately by means of the second computing layer, the method further includes: respectively scaling the multiple region segmentation results to the same size as the global feature map by means of the second computing layer. In view of the above, it can be ensured that the dimension of the finally obtained region feature vector is the same as that of the global feature vector.

In some embodiments, the front segmentation result, the rear segmentation result, the left segmentation result, and the right segmentation result of the vehicle are first scaled to the same size as the global feature map, i.e., 12*12. Then, point multiplication is performed on each of the scaled front, rear, left, and right segmentation results and the global feature map, to obtain a front feature map, a rear feature map, a left feature map, and a right feature map of the vehicle. Then, the second inception module performs a convolution operation and a pooling operation on the front feature map, the rear feature map, the left feature map, and the right feature map of the vehicle, respectively, to obtain a feature map set corresponding to each of the local feature maps, and the spatial size of the feature maps in each feature map set is 6*6. Finally, a global maximum pooling operation is performed on the feature maps in the feature map sets corresponding to the multiple local feature maps by means of the second pooling layer, to obtain a front feature vector, a rear feature vector, a left feature vector, and a right feature vector of the vehicle, each of which is 1536-dimensional. The global maximum pooling operation, rather than average pooling, is applied to the feature maps in the feature map sets corresponding to the multiple local feature maps, because the maximum response is more suitable for extracting features from a local region.
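Under the same assumptions as the global branch sketch above, one local branch might look as follows in PyTorch; the inception stand-in module is passed in (e.g., reusing the one defined above).

    import torch
    import torch.nn.functional as F

    def region_feature_vector(global_map, seg_result, inception):
        """One second processing subnet, sketched: global_map has shape
        (1, 1536, 12, 12); seg_result is a single-channel (H, W) weight
        map; inception is a stand-in for the second inception module."""
        # Second computing layer: scale the segmentation result to the
        # size of the global feature map, then point-multiply.
        mask = F.interpolate(seg_result[None, None],
                             size=global_map.shape[-2:], mode="bilinear",
                             align_corners=False)
        local_map = global_map * mask                     # local feature map
        fset = inception(local_map)                       # 6*6 feature map set
        # Second pooling layer: global MAX pooling, since the maximum
        # response suits local regions per the description above.
        return F.adaptive_max_pool2d(fset, 1).flatten(1)  # (1, 1536)

For example, the front feature vector could be obtained as region_feature_vector(global_map, masks_front, branch_inception), where masks_front is the (scaled) front segmentation result.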

As shown in part (b) of FIG. 4, the second neural network is divided into two phases, and the global features and the local features are extracted in a trunk-and-branch form. The first phase performs a convolution operation and a pooling operation on the image to be recognized to obtain a global feature map of the image to be recognized. The second phase consists of five branches: one global branch and four local region branches. The global branch processes the global feature map as in the foregoing embodiments to obtain the global feature vector, and each local region branch processes its specified region segmentation result in conjunction with the global feature map, as in the foregoing embodiments, to obtain the corresponding local feature vector.

At step S203, the global feature data and the multiple pieces of region feature data of the target vehicle are fused by means of a third neural network for feature fusion.

In one embodiment, step S203 may be performed by a processor by invoking a corresponding instruction stored in a memory, and may also be performed by a fusion sub-module 6031 run by the processor.

The third neural network may be any appropriate neural network that may implement feature fusion, and may include, but is not limited to, a convolutional neural network, a reinforcement learning neural network, a generative network in an adversarial neural network, etc. The optional structure of the neural network may be appropriately set by a person skilled in the art according to actual needs, such as the number of convolution layers, the size of the convolution kernel, the number of channels, etc., which is not limited in the embodiments of the present disclosure. In some embodiments, the third neural network has a first fully connected layer, a third computing layer, and a second fully connected layer which are connected to an output end of the second neural network.

According to one or more embodiments of the present disclosure, step S203 includes: obtaining weight values of multiple first region feature vectors by means of the first fully connected layer; respectively weighting the multiple first region feature vectors by means of the third computing layer according to the weight values to obtain corresponding multiple second region feature vectors; and performing a mapping operation on the multiple second region feature vectors and the global feature vector by means of the second fully connected layer to obtain an appearance feature vector of the target vehicle.

According to one or more embodiments of the present disclosure, the obtaining weight values of multiple first region feature vectors by means of the first fully connected layer includes: performing a stitching operation on the multiple first region feature vectors to obtain a stitched first region feature vector; performing a mapping operation on the stitched first region feature vector by means of the first fully connected layer to obtain a set of scalars corresponding to the multiple first region feature vectors; and performing a normalization operation on the scalars in the set to obtain the weight values of the multiple first region feature vectors.

In some embodiments, the following operations are included.

A stitching operation is performed on the front feature vector, the rear feature vector, the left feature vector, and the right feature vector of the vehicle, then the stitched front feature vector, rear feature vector, left feature vector, and right feature vector are input into the first fully connected layer, and the first fully connected layer performs a mapping operation on the four feature vectors to obtain a scalar set.

A normalization operation is performed on the scalars in the scalar set by means of the Softmax function to respectively obtain weight values of the front feature vector, the rear feature vector, the left feature vector, and the right feature vector.

The front feature vector, the rear feature vector, the left feature vector, and the right feature vector are respectively weighted according to the corresponding weight values, to obtain the weighted front feature vector, rear feature vector, left feature vector, and right feature vector.

A stitching operation is performed on the weighted front feature vector, rear feature vector, left feature vector, and right feature vector and the global feature vector.

The second fully connected layer performs a mapping operation on the stitched weighted local feature vectors and the global feature vector to obtain a 256-dimensional vehicle appearance feature vector, as shown in part (c) of FIG. 4.
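These operations can be sketched in PyTorch as follows; the 1536-dimensional region vectors, the four regions, and the 256-dimensional output follow the description, while everything else is an assumption.

    import torch
    import torch.nn as nn

    class FusionNet(nn.Module):
        """Sketch of the third neural network: stitch the region vectors,
        map them to scalars (first fully connected layer), normalize the
        scalars into weights with Softmax, weight the region vectors,
        stitch them with the global vector, and map the result to a
        256-dim appearance feature (second fully connected layer)."""
        def __init__(self, dim=1536, regions=4, out_dim=256):
            super().__init__()
            self.fc1 = nn.Linear(regions * dim, regions)       # scalar set
            self.fc2 = nn.Linear((regions + 1) * dim, out_dim)

        def forward(self, region_vecs, global_vec):
            # region_vecs: (B, 4, 1536); global_vec: (B, 1536)
            weights = torch.softmax(self.fc1(region_vecs.flatten(1)), dim=1)
            weighted = region_vecs * weights.unsqueeze(-1)     # weight regions
            stitched = torch.cat([weighted.flatten(1), global_vec], dim=1)
            return self.fc2(stitched)                          # (B, 256)

    appearance = FusionNet()(torch.randn(2, 4, 1536), torch.randn(2, 1536))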

In the process of feature fusion, the third neural network learns the weight values of the feature vectors of different vehicle regions, because the features of different vehicle regions may have different importance. The features of a visible region of the vehicle in the vehicle image may be retained or given a greater weight, while the features of an invisible region may be eliminated or given a small weight in the competition process. For example, if the orientation of the vehicle in the vehicle image is the left front, the left and front sides of the vehicle can be seen; the features of these two sides are relatively important, and the weight values of the corresponding feature vectors are relatively large, whereas the rear and right sides of the vehicle are invisible, and although the features of those two sides are also extracted, the weight values of their feature vectors are relatively small. In this way, the vehicle key points in the visible region of the vehicle image contribute more to the final vehicle appearance feature vector, and the influence of the vehicle key points in the invisible region of the vehicle image on the final vehicle appearance feature vector is weakened by a relatively small weight value. In view of the above, the appearance of the vehicle may be described more accurately.

FIG. 6 is a schematic diagram showing a weight value of a vehicle region for implementing the method embodiment of FIG. 2. As shown in FIG. 6, part (a) represents three input images of one vehicle at different photographing angles, together with the weight values of the front side, the rear side, the left side, and the right side of the vehicle in the image at each photographing angle; part (b) represents the projection result, in two-dimensional space, of the vehicle appearance features of the selected vehicle images in the test set; and part (c) represents three input images of another vehicle at different photographing angles, together with the weight values of the front side, the rear side, the left side, and the right side of the vehicle in the image at each photographing angle. As can be seen from the drawing, the appearance features of the same vehicle are aggregated regardless of the photographing angle of the vehicle image. Therefore, the appearance features of the vehicle recognized in the embodiments are independent of the photographing angle of the image to be recognized, and the vehicle appearance features in different vehicle images may be compared directly, and thus the problem that different regions between different vehicle images cannot be compared is solved. In addition, parts (a) and (c) of the drawing show the input vehicle images and the learned weights of the corresponding clusters, and the local region features of the vehicle appearance are fused based on these learned weights. It can be observed that the weight value of a surface of the vehicle that is visible in the vehicle image is higher than the weight value of a surface that is invisible.

In addition, an alternating training strategy may be adopted to train the second neural network and the third neural network. The training strategy includes four steps. At step (i), the trunk network of the first phase of the second neural network and the global branch of the second phase may be trained from random initialization, supervised by means of global features of the whole image region. At step (ii), after the training of the trunk network of the first phase is completed, the parameters of the trained global branch of the second phase may be used to initialize the four local branches of the second phase, because the global branch of the second phase has the same structure as the local branches; the training of the four local branches is then separately supervised by means of given classification tags. At step (iii), after the training of the trunk network of the first phase and the branches of the second phase is completed, the third neural network is trained. At step (iv), a neural network is initialized with the parameters learned in the foregoing steps, and all the parameters are jointly fine-tuned. Existing vehicle databases and the Softmax classification loss may be used during training of the second neural network and the third neural network.
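The four-step schedule can be summarized with the following Python sketch; the train helper, which would optimize exactly the listed modules under a Softmax classification loss, is a hypothetical stand-in and not part of the present disclosure.

    def alternating_training(trunk, global_branch, local_branches,
                             fusion_net, train):
        """train(modules) is an assumed helper that optimizes the given
        modules under a Softmax classification loss."""
        # (i) Trunk and global branch from random initialization.
        train([trunk, global_branch])
        # (ii) Initialize each local branch from the trained global branch
        # (same structure), then train the four local branches.
        for branch in local_branches:
            branch.load_state_dict(global_branch.state_dict())
        train(local_branches)
        # (iii) Train the third (fusion) neural network.
        train([fusion_net])
        # (iv) Joint fine-tuning of all learned parameters.
        train([trunk, global_branch, *local_branches, fusion_net])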

In an optional application, the vehicle appearance features recognized in the embodiments may be used to describe the vehicle, and may also be used to analyze vehicle attributes, such as a coarse-grained vehicle model, a fine-grained vehicle model, and a vehicle color. In addition, classification, recognition, and retrieval of the vehicle may be performed by using the vehicle appearance features recognized in the embodiments.

According to the method for vehicle appearance feature recognition in some embodiments, multiple region segmentation results of the target vehicle are obtained from the image to be recognized by means of a first neural network for region extraction, then global feature data and multiple pieces of region feature data of the target vehicle are extracted from the image to be recognized by means of a second neural network for feature extraction based on the multiple region segmentation results, and the global feature data and the multiple pieces of region feature data of the target vehicle are fused by means of a third neural network for feature fusion to obtain the appearance feature data of the target vehicle. Compared with the method for obtaining vehicle appearance features in the prior art, the method for vehicle appearance feature recognition in the embodiments recognizes not only the global features of the vehicle appearance but also the features of its local regions, and reflects detail information of the target vehicle by means of the local region features, so as to describe the appearance of the vehicle more accurately. In addition, by means of the vehicle appearance features recognized in the embodiments, the vehicle appearance features in different vehicle images may be compared directly, thereby solving the problem that different regions between different vehicle images cannot be compared.

The method for vehicle appearance feature recognition in the embodiments may be executed by any appropriate device having data processing capability, including, but not limited to, a terminal device, a server, and the like.

FIG. 7 is a flowchart of one embodiment of a method for vehicle retrieval according to the present disclosure.

Referring to FIG. 7, at step S301, the appearance feature data of the target vehicle in the image to be retrieved is obtained by means of the method for vehicle appearance feature recognition.

In some embodiments, the appearance feature data of the target vehicle in the image to be retrieved may be obtained by the method for vehicle appearance feature recognition provided in Embodiment 1 or Embodiment 2. The appearance feature data may be data represented by a vector. In terms of the contents included in the image, the image to be retrieved may be an image including a part of the target vehicle or an image including the whole target vehicle, etc. In terms of the category of the image, the image to be retrieved may be a photographed static image, or a video image in a video frame sequence, and may also be a synthetic image, etc.

In one embodiment, step S301 may be performed by a processor by invoking a corresponding instruction stored in a memory, and may also be performed by a second obtaining module 701 run by the processor.

At step S302, a candidate vehicle image library is searched for a target candidate vehicle image matching the appearance feature data.

In one embodiment, step S302 may be performed by a processor by invoking a corresponding instruction stored in a memory, and may also be performed by a searching module 702 run by the processor.

In some embodiments, the appearance feature data of the vehicles in multiple vehicle images to be selected in the candidate vehicle image library may be obtained by means of the method for vehicle appearance feature recognition provided in Embodiment 1 or Embodiment 2, and the appearance feature data of the target vehicle is respectively compared with the appearance feature data of the vehicles in the vehicle images to be selected, to obtain a target candidate vehicle image matching the appearance feature data of the target vehicle.

Exemplary embodiments of the present disclosure are directed to providing a method for vehicle retrieval. Obtaining appearance feature data of a target vehicle in an image to be retrieved by means of the method for vehicle appearance feature recognition provided in Embodiment 1 or Embodiment 2, and searching the candidate vehicle image library for a target candidate vehicle image matching the appearance feature data, may improve the accuracy of vehicle retrieval.

The method for vehicle retrieval in the embodiments may be executed by any appropriate device having data processing capability, including, but not limited to, a terminal device, a server, and the like.

FIG. 8 is a flowchart of another embodiment of the method for vehicle retrieval according to the present disclosure.

Referring to FIG. 8, at step S401, the appearance feature data of the target vehicle in the image to be retrieved is obtained by means of the method for vehicle appearance feature recognition.

In one optional example, step S401 may be performed by a processor by invoking a corresponding instruction stored in a memory, and may also be performed by a second obtaining module 804 run by the processor.

Since step S401 is the same as step S301, details are not described herein again.

At step S402, cosine distances between the appearance feature vector of the target vehicle and the appearance feature vectors of vehicles in the vehicle images to be selected in the candidate vehicle image library are separately determined.

In one embodiment, step S402 may be performed by a processor by invoking a corresponding instruction stored in a memory, and may also be performed by a searching module 805 run by the processor.

In some embodiments, a person skilled in the art may separately compute the cosine distances between the appearance feature vector of the target vehicle and the appearance feature vectors of the vehicles in the vehicle images to be selected according to the existing cosine distance computational formula.

At step S403, a target candidate vehicle image matching the target vehicle is determined according to the cosine distances.

In one embodiment, step S403 may be performed by a processor by invoking a corresponding instruction stored in a memory, and may also be performed by a searching module 805 run by the processor.

In some embodiments, when the cosine distance between the appearance feature vector of the target vehicle and the appearance feature vector of the vehicle in a vehicle image to be selected is greater than or equal to a first preset threshold, the vehicle image to be selected is determined to be a target candidate vehicle image matching the target vehicle. A person skilled in the art may obtain the first preset threshold by means of tests. Certainly, the embodiments of the present disclosure are not limited thereto.
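A short Python sketch of this matching rule follows. Note that the text's "cosine distance" is read here as the cosine of the angle between the vectors (larger meaning more similar), since matches are declared at or above the threshold; that reading, and the threshold value below, are assumptions.

    import numpy as np

    def cosine_distance(a, b):
        """Cosine of the angle between two appearance feature vectors;
        read from the text as 'larger means closer' (an assumption)."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def matching_candidates(query_vec, candidate_vecs, threshold=0.8):
        """Indices of vehicle images to be selected whose cosine distance
        to the query is at or above the (illustrative) preset threshold."""
        return [i for i, v in enumerate(candidate_vecs)
                if cosine_distance(query_vec, v) >= threshold]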

According to one or more embodiments of the present disclosure, the method further includes: obtaining the photographing time and/or a photographing position of the image to be retrieved and the photographing times and/or photographing positions of the multiple vehicle images to be selected; determining temporal-spatial distances between the target vehicle and the vehicles in the multiple vehicle images to be selected according to the photographing times and/or the photographing positions; and determining, according to the cosine distances and the temporal-spatial distances, a target candidate vehicle image matching the target vehicle in the candidate vehicle image library. Therefore, the accuracy of vehicle retrieval may be further improved.

According to one or more embodiments of the present disclosure, the determining, according to the cosine distances and the temporal-spatial distances, a target candidate vehicle image matching the target vehicle in the candidate vehicle image library includes: obtaining the multiple vehicle images to be selected from the candidate vehicle image library according to the cosine distances; determining a temporal-spatial matching probability of each vehicle image to be selected and the target vehicle based on the photographing time and the photographing position of the vehicle image to be selected, respectively; and determining, according to the cosine distances and the temporal-spatial matching probabilities, a target candidate vehicle image matching the target vehicle.

The temporal-spatial information of the vehicle image may greatly enhance the recall rate of vehicle retrieval. If the photographing time and the photographing position of the to-be-retrieved vehicle image are known, the probability of occurrence of the vehicle in the vehicle image at another time and at another position may be obtained by statistical modeling. This is effective for retrieval tasks. The temporal-spatial matching probability is determined by the photographing times and the photographing positions of the vehicle image to be selected and the target vehicle image. In short, the temporal-spatial matching probability refers to a probability of occurrence of the target vehicle at the photographing time and the photographing position, which is obtained by statistical modeling according to the photographing time and the photographing position of the vehicle image. According to one or more embodiments of the present disclosure, the temporal-spatial matching probability refers to a conditional probability of a vehicle transfer interval between two cameras, which may be calculated by the following Formula 1.

In practical application scenarios, vehicle appearance features may not be sufficient to distinguish a vehicle from other vehicles, particularly when vehicles have the same exterior without personalized decoration. However, in the monitoring application, the photographing time and the photographing position of the vehicle image can easily be obtained. By analyzing the vehicle transfer interval between two cameras, the inventors of the present disclosure find that for at least one pair of cameras, the vehicle transfer interval may be modeled as a random variable that follows a probability distribution. Due to the Gaussian-like and long-tailed properties of the vehicle transfer interval, a lognormal distribution may be used to model this random variable. Given that l represents the camera at which the vehicle leaves and e represents the camera at which the vehicle enters, the conditional probability of the vehicle transfer interval τ between l and e is computed by means of the following Formula 1:

$p\left(\tau \mid l, e; \mu_{l,e}, \sigma_{l,e}\right) = \ln\mathcal{N}\left(\tau; \mu_{l,e}, \sigma_{l,e}\right) = \frac{1}{\tau\sigma_{l,e}\sqrt{2\pi}}\exp\left[-\frac{\left(\ln\tau - \mu_{l,e}\right)^{2}}{2\sigma_{l,e}^{2}}\right]$ (Formula 1)

wherein μ_{l,e} and σ_{l,e} respectively represent the estimated parameters for each pair of cameras (l, e), and the vehicle transfer interval τ is the absolute difference between the photographing times of the two vehicle images. The estimated parameters may be computed by maximizing the following likelihood function:

$L\left(\tau \mid l, e; \mu_{l,e}, \sigma_{l,e}\right) = \prod_{n=1}^{N}\left(\frac{1}{\tau_{n}}\right)\mathcal{N}\left(\ln\tau_{n}; \mu_{l,e}, \sigma_{l,e}\right)$

wherein τ_{n} ∈ τ (n = 1, 2, 3, . . . , N) represents a vehicle transfer interval between the two cameras of a pair (l, e) sampled from the training set, with τ denoting the set of vehicle transfer interval samples between the two cameras in the training set.

After obtaining the conditional probability of the vehicle transfer interval τ between l and e, the temporal-spatial distance of vehicles between two vehicle images may be computed according to the following Formula 2:

$D_{s} = 1 / \left(1 + e^{\alpha\left(p\left(\tau \mid l, e; \mu_{l,e}, \sigma_{l,e}\right) - 0.5\right)}\right)$ (Formula 2)

wherein the higher the conditional probability is, the smaller the temporal-spatial distance of vehicles between the two vehicle images is.

Finally, the similarity distance between two vehicle images may be computed according to the following Formula 3:

$D = D_{a} + \beta D_{s}$ (Formula 3)

wherein D_{a} represents the cosine distance of the vehicle appearance feature vectors between the two vehicle images, D_{s} represents the temporal-spatial distance between the two vehicle images, and D represents the similarity distance of the vehicles between the two vehicle images; α is set to 2 and β is set to 0.1. The smaller the similarity distance between the two vehicle images is, the more similar the vehicles in the two vehicle images are.
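Formulas 1 to 3 can be implemented directly; the sketch below also estimates μ_{l,e} and σ_{l,e} by the closed-form lognormal maximum likelihood solution (the mean and standard deviation of the log intervals), which is a standard statistical result rather than something stated in the text.

    import numpy as np

    def fit_transfer_interval(intervals):
        """MLE for the lognormal parameters of one camera pair (l, e):
        mean and standard deviation of the log transfer intervals."""
        logs = np.log(np.asarray(intervals, dtype=float))
        return logs.mean(), logs.std()

    def transfer_probability(tau, mu, sigma):
        """Formula 1: lognormal density of the transfer interval tau."""
        return np.exp(-(np.log(tau) - mu) ** 2 / (2 * sigma ** 2)) \
            / (tau * sigma * np.sqrt(2 * np.pi))

    def temporal_spatial_distance(tau, mu, sigma, alpha=2.0):
        """Formula 2: the higher the probability, the smaller D_s."""
        p = transfer_probability(tau, mu, sigma)
        return 1.0 / (1.0 + np.exp(alpha * (p - 0.5)))

    def similarity_distance(d_a, d_s, beta=0.1):
        """Formula 3: fuse the cosine distance and the temporal-spatial
        distance; alpha = 2 and beta = 0.1 follow the text."""
        return d_a + beta * d_s

Re-ranking the candidate images then amounts to sorting them by the fused distance D in ascending order.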

When the similarity distance between the target vehicle and the vehicle in a vehicle image to be selected is less than or equal to a second preset threshold, it can be determined that the vehicle image to be selected is a target candidate vehicle image matching the target vehicle. A person skilled in the art may obtain the second preset threshold by means of tests. Certainly, the embodiments of the present disclosure are not limited thereto.

FIG. 9 is a schematic diagram showing a similarity distance of a vehicle for implementing the method embodiment of FIG. 8. As shown in FIG. 9, the images in the boxes in the first row are the top five vehicle images to be selected obtained according to the cosine distances, the leftmost image in the first row is the image of the target vehicle, and the bottom row of images is a reordering result obtained based on the temporal-spatial distances between the vehicle images to be selected and the image of the target vehicle. According to one or more embodiments of the present disclosure, the conditional probability of the vehicle transfer interval is computed by Formula 1 according to the photographing times of the target vehicle image and the vehicle image to be selected and the serial numbers of the cameras that photographed them. Then, the temporal-spatial distance between the target vehicle image and the vehicle image to be selected is computed from the conditional probability of the vehicle transfer interval by Formula 2, and the similarity distance of the vehicles between the target vehicle image and the vehicle image to be selected is computed from the known cosine distance and the computed temporal-spatial distance by Formula 3. Finally, the sorting result of the vehicle images to be selected is reordered according to the similarity distances of the vehicles between the target vehicle image and the vehicle images to be selected, to obtain a reordering result of the vehicle images to be selected.

Exemplary embodiments of the present disclosure are directed to providing a method for vehicle retrieval. Obtaining appearance feature data of a target vehicle in an image to be retrieved by means of the method for vehicle appearance feature recognition provided in Embodiment 1 or Embodiment 2, and searching the candidate vehicle image library for a target candidate vehicle image matching the appearance feature data, may improve the accuracy of vehicle retrieval.

The method for vehicle retrieval in the embodiments may be executed by any appropriate device having data processing capability, including, but not limited to, a terminal device, a server, and the like. Alternatively, any method provided by the embodiments of the present disclosure is executed by a processor, for example, any method mentioned in the embodiments of the present disclosure is executed by the processor by invoking a corresponding instruction stored in a memory. Details are not described below again.

A person of ordinary skill in the art may understand that all or some steps for implementing the foregoing method embodiments are achieved by a program instructing related hardware; the foregoing program can be stored in a computer-readable storage medium; when the program is executed, steps including the foregoing method embodiments are executed. Moreover, the foregoing storage medium includes various media capable of storing program codes, such as Read-Only Memory (ROM), Random Access Memory (RAM), a magnetic disk, or an optical disk.

Based on the same technical concept, FIG. 10 is a schematic structural diagram of one embodiment of an apparatus for vehicle appearance feature recognition according to the present disclosure. The apparatus may be used to execute the procedures of the method for vehicle appearance feature recognition according to Embodiment 1.

Referring to FIG. 10, the apparatus for vehicle appearance feature recognition includes a first obtaining module 501, an extraction module 502, and a fusion module 503.

The first obtaining module 501 is configured to obtain multiple region segmentation results of a target vehicle from an image to be recognized.

The extraction module 502 is configured to extract global feature data and multiple pieces of region feature data from the image to be recognized based on the multiple region segmentation results.

The fusion module 503 is configured to fuse the global feature data and the multiple pieces of region feature data to obtain appearance feature data of the target vehicle.

By means of the apparatus for vehicle appearance feature recognition provided by the embodiments, multiple region segmentation results of a target vehicle are obtained from an image to be recognized including the target vehicle, then global feature data and multiple pieces of region feature data are extracted from the image to be recognized based on the multiple region segmentation results, and the global feature data and the multiple pieces of region feature data are fused to obtain appearance feature data of the target vehicle. The vehicle appearance features recognized by the embodiments include features of the local regions of the vehicle appearance, so as to describe the appearance of the vehicle more accurately. In addition, by means of the vehicle appearance features recognized in the embodiments, the vehicle appearance features in different vehicle images may be compared directly, thereby solving the problem that different regions between different vehicle images cannot be compared.

Based on the same technical concept, FIG. 11 is a schematic structuraldiagram of another embodiment of the apparatus for vehicle appearancefeature recognition according to the present disclosure. The apparatusmay be used to execute the procedures of the method for vehicleappearance feature recognition according to Embodiment 2.

Referring to FIG. 11, the apparatus for vehicle appearance feature recognition includes a first obtaining module 601, an extraction module 602, and a fusion module 603. The first obtaining module 601 is configured to obtain multiple region segmentation results of a target vehicle from an image to be recognized. The extraction module 602 is configured to extract global feature data and multiple pieces of region feature data from the image to be recognized based on the multiple region segmentation results. The fusion module 603 is configured to fuse the global feature data and the multiple pieces of region feature data to obtain appearance feature data of the target vehicle.

According to one or more embodiments of the present disclosure, the multiple region segmentation results respectively correspond to regions of different orientations of the target vehicle.

According to one or more embodiments of the present disclosure, the multiple region segmentation results include segmentation results of a front side, a rear side, a left side, and a right side of the target vehicle.

According to one or more embodiments of the present disclosure, the first obtaining module 601 includes: an obtaining sub-module 6011, configured to obtain multiple region segmentation results of the target vehicle from the image to be recognized by means of a first neural network for region extraction.

According to one or more embodiments of the present disclosure, the first neural network has a first feature extraction layer and a first computing layer connected to a tail end of the first feature extraction layer, wherein the obtaining sub-module 6011 is configured to: perform feature extraction on the image to be recognized by means of the first feature extraction layer to obtain multiple key points of the target vehicle; classify the multiple key points by means of the first computing layer to obtain multiple key point clusters; and respectively fuse feature maps of key points in the multiple key point clusters to obtain region segmentation results corresponding to the multiple key point clusters.
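Purely for illustration, the following Python sketch (using PyTorch) shows one plausible way a first computing layer could group per-key-point response maps into orientation clusters and fuse each cluster into a region segmentation mask. The number of key points, the cluster assignment in KEYPOINT_CLUSTERS, and the element-wise-max fusion are assumptions made for the example, not the disclosed implementation.

```python
# Illustrative sketch only: grouping per-key-point response maps from an
# hourglass-style backbone into orientation clusters and fusing them.
import torch

NUM_KEYPOINTS = 20  # assumed number of vehicle key points

# Assumed mapping of key point indices to the four orientation clusters.
KEYPOINT_CLUSTERS = {
    "front": [0, 1, 2, 3, 4],
    "rear":  [5, 6, 7, 8, 9],
    "left":  [10, 11, 12, 13, 14],
    "right": [15, 16, 17, 18, 19],
}

def fuse_keypoint_maps(keypoint_maps: torch.Tensor) -> torch.Tensor:
    """keypoint_maps: (NUM_KEYPOINTS, H, W) response maps produced by the
    first feature extraction layer. Returns (4, H, W) region masks, one
    per orientation cluster."""
    masks = []
    for indices in KEYPOINT_CLUSTERS.values():
        # Fuse the cluster's response maps; element-wise max is one
        # plausible fusion, summation would be another.
        fused = keypoint_maps[indices].max(dim=0).values
        masks.append(fused)
    return torch.stack(masks)  # (4, H, W)
```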

According to one or more embodiments of the present disclosure, the extraction module 602 includes: an extraction sub-module 6021, configured to extract global feature data and multiple pieces of region feature data of the target vehicle from the image to be recognized by means of a second neural network for feature extraction based on the multiple region segmentation results.

According to one or more embodiments of the present disclosure, the second neural network has a first processing subnet and multiple second processing subnets separately connected to an output end of the first processing subnet, wherein the first processing subnet has a second feature extraction layer, a first inception module, and a first pooling layer, and each second processing subnet has a second computing layer, a second inception module, and a second pooling layer which are connected to the output end of the first processing subnet.

According to one or more embodiments of the present disclosure, the extraction sub-module 6021 includes: a first feature extraction unit 6022, configured to perform a convolution operation and a pooling operation on the image to be recognized by means of the second feature extraction layer to obtain a global feature map of the target vehicle; a second feature extraction unit 6023, configured to perform a convolution operation and a pooling operation on the global feature map by means of the first inception module to obtain a first feature map set of the target vehicle; and a first pooling unit 6024, configured to perform a pooling operation on feature maps in the first feature map set by means of the first pooling layer to obtain a global feature vector of the target vehicle.
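As a hedged sketch of the first processing subnet described above, the following code wires a convolution-and-pooling stem, a single convolution standing in for the first inception module, and a global pooling layer. All channel sizes and the use of adaptive average pooling are illustrative assumptions, not the disclosed architecture.

```python
# Minimal sketch of the global branch of the second network.
import torch
import torch.nn as nn

class GlobalBranch(nn.Module):
    def __init__(self):
        super().__init__()
        # Second feature extraction layer: convolution + pooling.
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )
        # Stand-in for the first inception module (a single conv here).
        self.inception1 = nn.Sequential(
            nn.Conv2d(64, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )
        # First pooling layer: collapse spatial dimensions to a vector.
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, image: torch.Tensor):
        feature_map = self.stem(image)           # global feature map
        fmap_set = self.inception1(feature_map)  # first feature map set
        vector = self.pool(fmap_set).flatten(1)  # global feature vector
        return feature_map, vector
```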

According to one or more embodiments of the present disclosure, the extraction sub-module 6021 further includes: a first computing unit 6026, configured to perform point multiplication on the multiple region segmentation results and the global feature map separately by means of the second computing layer, to obtain local feature maps respectively corresponding to the multiple region segmentation results; a third feature extraction unit 6027, configured to perform a convolution operation and a pooling operation on the local feature maps of the multiple region segmentation results by means of the second inception module to obtain a second feature map set corresponding to the multiple region segmentation results; and a second pooling unit 6028, configured to perform a pooling operation on the second feature map set of the multiple region segmentation results by means of the second pooling layer to obtain first region feature vectors corresponding to the multiple region segmentation results.

According to one or more embodiments of the present disclosure, the extraction sub-module 6021 further includes: a second computing unit 6025, configured to respectively scale the multiple region segmentation results to the same size as the global feature map by means of the second computing layer.
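The following sketch combines the second computing unit and one second processing subnet described above: a region segmentation result is scaled to the global feature map's spatial size, point-multiplied with the map, and reduced to a first region feature vector. The single convolution standing in for the second inception module, the bilinear scaling, and the channel sizes (which match the GlobalBranch sketch above) are assumptions.

```python
# Sketch of one region branch of the second network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionBranch(nn.Module):
    def __init__(self, in_channels: int = 64, out_channels: int = 256):
        super().__init__()
        # Stand-in for the second inception module.
        self.inception2 = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)  # second pooling layer

    def forward(self, feature_map: torch.Tensor, mask: torch.Tensor):
        # feature_map: (N, C, H, W); mask: (N, 1, h, w) for one region.
        mask = F.interpolate(mask, size=feature_map.shape[-2:],
                             mode="bilinear", align_corners=False)  # scale
        local_map = feature_map * mask          # point multiplication
        fmap_set = self.inception2(local_map)   # second feature map set
        return self.pool(fmap_set).flatten(1)   # first region feature vector
```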

According to one or more embodiments of the present disclosure, the fusion module 603 includes: a fusion sub-module 6031, configured to fuse the global feature data and the multiple pieces of region feature data of the target vehicle by means of a third neural network for feature fusion.

According to one or more embodiments of the present disclosure, the third neural network has a first fully connected layer, a third computing layer, and a second fully connected layer which are connected to an output end of the second neural network, wherein the fusion sub-module 6031 includes: a first obtaining unit 6032, configured to obtain weight values of the first region feature vectors by means of the first fully connected layer; a third computing unit 6033, configured to respectively weight the multiple first region feature vectors by means of the third computing layer according to the weight values to obtain corresponding multiple second region feature vectors; and a mapping unit 6034, configured to perform a mapping operation on the multiple second region feature vectors and the global feature vector by means of the second fully connected layer to obtain an appearance feature vector of the target vehicle.

According to one or more embodiments of the present disclosure, the first obtaining unit 6032 is configured to: perform a stitching operation on the multiple first region feature vectors to obtain a stitched first region feature vector; perform a mapping operation on the stitched first region feature vector by means of the first fully connected layer to obtain a set of scalars corresponding to the multiple first region feature vectors; and perform a normalization operation on the multiple scalars in the set to obtain the weight values of the multiple first region feature vectors.
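The weighting-and-fusion scheme described in the two preceding paragraphs can be sketched as follows, assuming four region branches, softmax as the normalization operation, and illustrative dimensions that match the branch sketches above.

```python
# Sketch of the third (fusion) network: stitch region vectors, map them
# to one scalar each, normalize into weights, weight the region vectors,
# then map everything to the appearance feature vector.
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    def __init__(self, region_dim=256, global_dim=256, out_dim=512, regions=4):
        super().__init__()
        # First fully connected layer: stitched vector -> one scalar per region.
        self.fc1 = nn.Linear(regions * region_dim, regions)
        # Second fully connected layer: weighted region vectors plus the
        # global vector -> appearance feature vector.
        self.fc2 = nn.Linear(regions * region_dim + global_dim, out_dim)

    def forward(self, region_vecs, global_vec):
        # region_vecs: list of (N, region_dim); global_vec: (N, global_dim)
        stitched = torch.cat(region_vecs, dim=1)            # stitching
        weights = torch.softmax(self.fc1(stitched), dim=1)  # normalized weights
        weighted = [w.unsqueeze(1) * v
                    for w, v in zip(weights.unbind(dim=1), region_vecs)]
        fused_in = torch.cat(weighted + [global_vec], dim=1)
        return self.fc2(fused_in)               # appearance feature vector
```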

According to one or more embodiments of the present disclosure, the first feature extraction layer is an hourglass network structure.
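An hourglass structure can be sketched, in heavily simplified form, as a downsample-then-upsample module with a skip connection; the depth, channel count, and single skip below are illustrative assumptions rather than the disclosed layer.

```python
# Heavily simplified hourglass block: downsample, process, upsample, and
# add a skip connection; real hourglass networks stack such blocks
# recursively at several resolutions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyHourglass(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.down = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
        self.mid = nn.Conv2d(channels, channels, 3, padding=1)
        self.skip = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        low = F.relu(self.down(x))        # encode at lower resolution
        low = F.relu(self.mid(low))
        up = F.interpolate(low, size=x.shape[-2:], mode="nearest")  # decode
        return up + self.skip(x)          # fuse with the skip connection
```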

It should be noted that the specific details further involved in the apparatus for vehicle appearance feature recognition provided by the embodiments of the present disclosure have been described in detail in the method for vehicle appearance feature recognition provided by the embodiments of the present disclosure, and are not described herein again.

Based on the same technical concept, FIG. 12 is a schematic structural diagram of one embodiment of an apparatus for vehicle retrieval according to the present disclosure. The apparatus may be used to execute the procedures of the method for vehicle retrieval according to Embodiment 3.

Referring to FIG. 12, the apparatus for vehicle retrieval includes a second obtaining module 701 and a searching module 702.

The second obtaining module 701 is configured to obtain appearance feature data of a target vehicle in an image to be retrieved by means of the apparatus according to Embodiment 5 or Embodiment 6.

The searching module 702 is configured to search a candidate vehicle image library for a target candidate vehicle image matching the appearance feature data.

Exemplary embodiments of the present disclosure are directed to providing an apparatus for vehicle retrieval. Obtaining appearance feature data of a target vehicle in an image to be retrieved by means of the apparatus for vehicle appearance feature recognition provided in Embodiment 5 or Embodiment 6, and then searching the candidate vehicle image library for a target candidate vehicle image matching the appearance feature data, may improve the accuracy of vehicle retrieval.

Based on the same technical concept, FIG. 13 is a schematic structural diagram of another embodiment of the apparatus for vehicle retrieval according to the present disclosure. The apparatus may be used to execute the procedures of the method for vehicle retrieval according to Embodiment 4.

Referring to FIG. 13, the apparatus for vehicle retrieval includes a second obtaining module 804 and a searching module 805. The second obtaining module 804 is configured to obtain appearance feature data of a target vehicle in an image to be retrieved by means of the apparatus according to Embodiment 5 or Embodiment 6. The searching module 805 is configured to search a candidate vehicle image library for a target candidate vehicle image matching the appearance feature data.

According to one or more embodiments of the present disclosure, the searching module 805 is configured to: determine cosine distances between the appearance feature vector of the target vehicle and appearance feature vectors of vehicles in vehicle images to be selected in the candidate vehicle image library, separately; and determine, according to the cosine distances, a target candidate vehicle image matching the target vehicle.
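A minimal sketch of such a cosine-distance search, assuming the appearance feature vectors of the candidate library are stored row-wise in a NumPy matrix:

```python
# Rank candidate vehicle images by cosine distance to the query vector.
import numpy as np

def rank_by_cosine(query_vec: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """query_vec: (D,); gallery: (M, D). Returns gallery indices sorted
    from best match (smallest cosine distance) to worst."""
    q = query_vec / np.linalg.norm(query_vec)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    cosine_distance = 1.0 - g @ q  # smaller means more similar
    return np.argsort(cosine_distance)
```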

According to one or more embodiments of the present disclosure, the apparatus in the embodiments further includes: a third obtaining module 801, configured to obtain the photographed time and/or the photographing position of the image to be retrieved and the photographed times and/or photographing positions of the multiple vehicle images to be selected; a first determining module 802, configured to determine temporal-spatial distances between the target vehicle and vehicles in the multiple vehicle images to be selected according to the photographed times and/or the photographing positions; and a second determining module 803, configured to determine, according to the cosine distances and the temporal-spatial distances, a target candidate vehicle image matching the target vehicle in the candidate vehicle image library.

According to one or more embodiments of the present disclosure, the second determining module 803 is configured to: obtain the multiple vehicle images to be selected from the candidate vehicle image library according to the cosine distances; determine, for each vehicle image to be selected, a temporal-spatial matching probability between that vehicle image and the target vehicle based on the photographed time and the photographing position of the vehicle image to be selected; and determine, according to the cosine distances and the temporal-spatial matching probabilities, a target candidate vehicle image matching the target vehicle.
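One way such a combination could be realized is sketched below. The exponential-decay form of the temporal-spatial matching probability and the log-probability fusion with the cosine distance are assumptions made for illustration only, not the disclosed model.

```python
# Combine appearance distance with an assumed temporal-spatial
# matching probability for re-ranking candidates.
import numpy as np

def spatiotemporal_prob(dt_seconds, dist_meters, tau=600.0, sigma=2000.0):
    """Assumed matching probability: decays with the gap between the
    photographed times and the distance between photographing positions."""
    return np.exp(-np.abs(dt_seconds) / tau) * np.exp(-(dist_meters / sigma) ** 2)

def combined_score(cosine_dist, dt_seconds, dist_meters, alpha=1.0):
    """Lower is better: the appearance distance is penalized when the
    temporal-spatial match is improbable."""
    p = spatiotemporal_prob(dt_seconds, dist_meters)
    return cosine_dist - alpha * np.log(p + 1e-12)
```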

It should be noted that the specific details further involved in the apparatus for vehicle retrieval provided by the embodiments of the present disclosure have been described in detail in the method for vehicle retrieval provided by the embodiments of the present disclosure, and are not described herein again.

The embodiments of the present disclosure further provide an electronic device which, for example, may be a mobile terminal, a PC, a tablet computer, a server, and the like. Referring to FIG. 14 below, a schematic structural diagram of one embodiment of an electronic device 900, which may be a terminal device or a server, suitable for implementing the embodiments of the present disclosure is shown. As shown in FIG. 14, the electronic device 900 includes one or more first processors, a first communication element, and the like. The one or more first processors are, for example, one or more Central Processing Units (CPUs) 901 and/or one or more Graphics Processing Units (GPUs) 913 and the like, and may execute appropriate actions and processing according to executable instructions stored in a Read-Only Memory (ROM) 902 or executable instructions loaded from a storage section 908 to a Random Access Memory (RAM) 903. In some embodiments, the ROM 902 and the RAM 903 are collectively called a first memory. The first communication element includes a communication component 912 and/or a communication interface 909. The communication component 912 may include, but is not limited to, a network card, and the network card may include, but is not limited to, an InfiniBand (IB) network card. The communication interface 909 includes a communication interface of a network interface card such as a LAN card or a modem, and the communication interface 909 performs communication processing via a network such as the Internet.

The first processor may be in communication with the ROM 902 and/or the RAM 903 to execute the executable instructions, is connected to the communication component 912 by means of the first communication bus 904, and is in communication with other target devices by means of the communication component 912, so as to complete operations corresponding to any method for vehicle appearance feature recognition provided by some embodiments of the present disclosure. For example, multiple region segmentation results of a target vehicle are obtained from an image to be recognized, global feature data and multiple pieces of region feature data are extracted from the image to be recognized based on the multiple region segmentation results, and the global feature data and the multiple pieces of region feature data are fused to obtain appearance feature data of the target vehicle.

In addition, the RAM 903 may further store various programs and data required for operations of an apparatus. The CPU 901 or GPU 913, the ROM 902, and the RAM 903 are connected to each other by means of the first communication bus 904. In the presence of the RAM 903, the ROM 902 is an optional module. The RAM 903 stores executable instructions, or writes the executable instructions to the ROM 902 during running, wherein the executable instructions enable the first processor to perform corresponding operations of the foregoing communication method. An Input/Output (I/O) interface 905 is also connected to the first communication bus 904. The communication component 912 may be integrated, and may also be configured to have a plurality of sub-modules (for example, a plurality of IB network cards) linked on the communication bus.

The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output section 907 including a Cathode-Ray Tube (CRT), a Liquid Crystal Display (LCD), a loudspeaker, and the like; a storage section 908 including a hard disk and the like; and the communication interface 909 of a network interface card including a LAN card, a modem, and the like. A drive 910 is also connected to the I/O interface 905 according to requirements. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 910 according to requirements, so that a computer program read from the removable medium is installed in the storage section 908 according to requirements.

It should be noted that the architecture illustrated in FIG. 14 is merely an optional implementation. In practice, the number and types of the components in FIG. 14 may be selected, decreased, increased, or replaced according to actual requirements. Different functional components may be separated or integrated. For example, the GPU and the CPU may be separated, or the GPU may be integrated into the CPU, and the communication element may be separated from, or integrated into, the CPU or the GPU. These alternative implementations all fall within the scope of protection of the present disclosure.

Particularly, the process described above with reference to the flowchart according to the embodiments of the present disclosure may be implemented as a computer software program. For example, the embodiments of the present disclosure include a computer program product, which includes a computer program tangibly contained in a machine-readable medium. The computer program includes a program code for executing the method illustrated in the flowchart. The program code may include corresponding instructions for correspondingly executing the steps of the methods provided by the embodiments of the present disclosure. For example, multiple region segmentation results of a target vehicle are obtained from an image to be recognized, global feature data and multiple pieces of region feature data are extracted from the image to be recognized based on the multiple region segmentation results, and the global feature data and the multiple pieces of region feature data are fused to obtain appearance feature data of the target vehicle. In such embodiments, the computer program may be downloaded from a network by means of the communication element and installed, and/or installed from the removable medium 911. When the computer program is executed by the first processor, the functions provided in the method according to the embodiments of the present disclosure are executed.

The embodiments of the present disclosure further provide an electronic device which, for example, may be a mobile terminal, a PC, a tablet computer, a server, and the like. Referring to FIG. 15 below, a schematic structural diagram of another embodiment of an electronic device 1000, which may be a terminal device or a server, suitable for implementing the embodiments of the present disclosure is shown. As shown in FIG. 15, the electronic device 1000 includes one or more second processors, a second communication element, and the like. The one or more second processors are, for example, one or more CPUs 1001 and/or one or more GPUs 1013 and the like, and may execute appropriate actions and processing according to executable instructions stored in a ROM 1002 or executable instructions loaded from a storage section 1008 to a RAM 1003. In the embodiments, the ROM 1002 and the RAM 1003 are collectively called a second memory. The second communication element includes a communication component 1012 and/or a communication interface 1009. The communication component 1012 may include, but is not limited to, a network card, and the network card may include, but is not limited to, an IB network card. The communication interface 1009 includes a communication interface of a network interface card such as a LAN card or a modem, and the communication interface 1009 performs communication processing via a network such as the Internet.

The second processor may be in communication with the ROM 1002 and/or the RAM 1003 to execute the executable instructions, is connected to the communication component 1012 by means of the second communication bus 1004, and is in communication with other target devices by means of the communication component 1012, so as to complete operations corresponding to any method for vehicle retrieval provided by the embodiments of the present disclosure. For example, appearance feature data of the target vehicle in the image to be retrieved is obtained by means of the method according to Embodiment 1 or Embodiment 2, and a candidate vehicle image library is searched for the target candidate vehicle image matching the appearance feature data.

In addition, the RAM 1003 may further store various programs and data required for operations of an apparatus. The CPU 1001 or GPU 1013, the ROM 1002, and the RAM 1003 are connected to each other by means of the second communication bus 1004. In the presence of the RAM 1003, the ROM 1002 is an optional module. The RAM 1003 stores executable instructions, or writes the executable instructions to the ROM 1002 during running, wherein the executable instructions enable the second processor to perform corresponding operations of the foregoing communication method. An I/O interface 1005 is also connected to the second communication bus 1004. The communication component 1012 may be integrated, and may also be configured to have a plurality of sub-modules (for example, a plurality of IB network cards) linked on the communication bus.

The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a Cathode-Ray Tube (CRT), a Liquid Crystal Display (LCD), a loudspeaker, and the like; a storage section 1008 including a hard disk and the like; and the communication interface 1009 of a network interface card including a LAN card, a modem, and the like. A drive 1010 is also connected to the I/O interface 1005 according to requirements. A removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 1010 according to requirements, so that a computer program read from the removable medium is installed in the storage section 1008 according to requirements.

It should be noted that the architecture illustrated in FIG. 15 is merely an optional implementation. In practice, the number and types of the components in FIG. 15 may be selected, decreased, increased, or replaced according to actual requirements. Different functional components may be separated or integrated. For example, the GPU and the CPU may be separated, or the GPU may be integrated into the CPU, and the communication element may be separated from, or integrated into, the CPU or the GPU. These alternative implementations all fall within the scope of protection of the present disclosure.

Particularly, the process described above with reference to the flowchart according to the embodiments of the present disclosure may be implemented as a computer software program. For example, the embodiments of the present disclosure include a computer program product, which includes a computer program tangibly contained in a machine-readable medium. The computer program includes a program code for executing the method illustrated in the flowchart. The program code may include corresponding instructions for correspondingly executing the steps of the methods provided by the embodiments of the present disclosure. For example, appearance feature data of the target vehicle in the image to be retrieved is obtained by means of the method according to Embodiment 1 or Embodiment 2, and a candidate vehicle image library is searched for the target candidate vehicle image matching the appearance feature data. In such embodiments, the computer program may be downloaded from a network by means of the communication element and installed, and/or installed from the removable medium 1011. When the computer program is executed by the second processor, the functions provided in the method according to the embodiments of the present disclosure are executed.

It should be noted that, according to needs for implementation, the components/steps described in the present disclosure may be separated into more components/steps, and two or more components/steps or some operations of the components/steps may also be combined into new components/steps.

The embodiments in the specification are all described in a progressive manner; for identical or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. The system embodiments correspond substantially to the method embodiments and therefore are only described briefly; for the associated parts, refer to the descriptions of the method embodiments.

The methods and apparatuses in the present disclosure may be implemented in many manners. For example, the methods and apparatuses in the present disclosure may be implemented with software, hardware, firmware, or any combination of software, hardware, and firmware. The foregoing specific sequence of steps of the methods is merely for description, and unless otherwise stated particularly, is not intended to limit the steps of the methods in the present disclosure. In addition, in some embodiments, the present disclosure is also implemented as programs recorded in a recording medium, and the programs include machine-readable instructions for implementing the methods according to the present disclosure. Therefore, the present disclosure further covers the recording medium storing the programs for performing the methods according to the present disclosure.

The descriptions of the present disclosure are provided for the purpose of examples and description, and are not intended to be exhaustive or to limit the present disclosure to the disclosed form. Many modifications and changes are obvious to a person of ordinary skill in the art. The embodiments are selected and described to better explain the principles and practical applications of the present disclosure, and to enable a person of ordinary skill in the art to understand the present disclosure, so as to design various embodiments with various modifications suited to particular uses.

1. A method for vehicle appearance feature recognition, comprising: obtaining a plurality of region segmentation results of a target vehicle from an image to be recognized; extracting global feature data and a plurality of pieces of region feature data from the image to be recognized based on the plurality of region segmentation results; and fusing the global feature data and the plurality of pieces of region feature data to obtain appearance feature data of the target vehicle, wherein the extracting global feature data and a plurality of pieces of region feature data from the image to be recognized based on the plurality of region segmentation results comprises: extracting global feature data and the plurality of pieces of region feature data of the target vehicle from the image to be recognized by means of a second neural network for feature extraction based on the plurality of region segmentation results, which comprises: performing point multiplication on the plurality of region segmentation results and a global feature map separately by means of a second computing layer, to obtain local feature maps respectively corresponding to the plurality of region segmentation results; performing a convolution operation and a pooling operation on the local feature maps of the plurality of region segmentation results by means of a second inception module to obtain a second feature map set corresponding to the plurality of region segmentation results; and performing a pooling operation on the second feature map set corresponding to the plurality of region segmentation results by means of a second pooling layer to obtain first region feature vectors corresponding to the plurality of region segmentation results, wherein before the performing point multiplication on the plurality of region segmentation results and a global feature map separately by means of a second computing layer, the method further comprises: respectively scaling the plurality of region segmentation results to the same size as the size of the global feature map by means of the second computing layer.

2. The method according to claim 1, wherein the plurality of region segmentation results respectively correspond to regions of different orientations of the target vehicle, wherein the plurality of region segmentation results comprise segmentation results of a front side, a rear side, a left side, and a right side of the target vehicle.
3. The method according to claim 1, wherein the obtaining a plurality of region segmentation results of a target vehicle from an image to be recognized comprises: obtaining the plurality of region segmentation results of the target vehicle from the image to be recognized by means of a first neural network for region extraction.
4. The method according to claim 3, wherein the first neural network has a first feature extraction layer and a first computing layer connected to a tail end of the first feature extraction layer, wherein the obtaining a plurality of region segmentation results of the target vehicle from the image to be recognized by means of a first neural network for region extraction comprises: performing feature extraction on the image to be recognized by means of the first feature extraction layer to obtain a plurality of key points of the target vehicle; and classifying the plurality of key points by means of the first computing layer to obtain a plurality of key point clusters, and respectively fusing feature maps of key points in the plurality of key point clusters, to obtain region segmentation results corresponding to the plurality of key point clusters.
5. The method according to claim 1, wherein the second neural network has a first processing subnet and a plurality of second processing subnets separately connected to an output end of the first processing subnet, wherein the first processing subnet has a second feature extraction layer, a first inception module, and a first pooling layer, and the second processing subnet has a second computing layer, a second inception module, and a second pooling layer which are connected to the output end of the first processing subnet.
6. The method according to claim 5, wherein the extracting global feature data and the plurality of pieces of region feature data of the target vehicle from the image to be recognized by means of a second neural network for feature extraction based on the plurality of region segmentation results further comprises: performing a convolution operation and a pooling operation on the image to be recognized by means of the second feature extraction layer to obtain a global feature map of the target vehicle; performing a convolution operation and a pooling operation on the global feature map by means of the first inception module to obtain a first feature map set of the target vehicle; and performing a pooling operation on feature maps in the first feature map set by means of the first pooling layer to obtain a global feature vector of the target vehicle.
7. The method according to claim 1, wherein the fusing the global feature data and the plurality of pieces of region feature data comprises: fusing the global feature data and the plurality of pieces of region feature data of the target vehicle by means of a third neural network for feature fusion.

8. The method according to claim 7, wherein the third neural network has a first fully connected layer, a third computing layer, and a second fully connected layer which are connected to an output end of the second neural network, wherein the fusing the global feature data and the plurality of pieces of region feature data of the target vehicle by means of a third neural network for feature fusion comprises: obtaining weight values of the first region feature vectors by means of the first fully connected layer; respectively weighting the first region feature vectors by means of the third computing layer according to the weight values to obtain a corresponding plurality of second region feature vectors; and performing a mapping operation on the plurality of second region feature vectors and a global feature vector by means of the second fully connected layer to obtain an appearance feature vector of the target vehicle.
9. The method according to claim 8, wherein the obtaining weight values of the first region feature vectors by means of the first fully connected layer comprises: performing a stitching operation on the first region feature vectors to obtain a stitched first region feature vector; performing a mapping operation on the stitched first region feature vector by means of the first fully connected layer to obtain a set of scalars corresponding to the first region feature vectors; and performing a normalization operation on scalars in the set of scalars to obtain the weight values of the first region feature vectors.
10. The method according to claim 4, wherein the first feature extraction layer is an hourglass network structure.
11. A method for vehicle retrieval, comprising: obtaining appearance feature data of a target vehicle in an image to be retrieved by means of the method according to claim 1; and searching a candidate vehicle image library for a target candidate vehicle image matching the appearance feature data.
12. The method according to claim 11, wherein the searching a candidate vehicle image library for a target candidate vehicle image matching the appearance feature data comprises: determining cosine distances between an appearance feature vector of the target vehicle and appearance feature vectors of vehicles in a plurality of vehicle images to be selected in the candidate vehicle image library, separately; and determining, according to the cosine distances, a target candidate vehicle image matching the target vehicle.
13. The method according to claim 12, further comprising: obtaining at least one of a photographed time or a photographing position of the image to be retrieved and at least one of photographed times or photographing positions of the plurality of vehicle images to be selected; determining temporal-spatial distances between the target vehicle and vehicles in the plurality of vehicle images to be selected according to the at least one of the photographed time or the photographing position of the image to be retrieved and the at least one of the photographed times or the photographing positions of the plurality of vehicle images to be selected; and determining, according to the cosine distances and the temporal-spatial distances, a target candidate vehicle image matching the target vehicle, in the candidate vehicle image library.
14. The method according to claim 13, wherein the determining, according to the cosine distances and the temporal-spatial distances, a target candidate vehicle image matching the target vehicle, in the candidate vehicle image library comprises: obtaining the plurality of vehicle images to be selected from the candidate vehicle image library according to the cosine distances; determining a temporal-spatial matching probability of the vehicle image to be selected and the target vehicle based on the photographed time and the photographing position of the vehicle image to be selected, respectively; and determining, according to the cosine distances and the temporal-spatial matching probability, a target candidate vehicle image matching the target vehicle.
15. An apparatus for vehicle appearance feature recognition, comprising: a memory storing processor-executable instructions; and a processor arranged to execute the processor-executable instructions to perform steps of: obtaining a plurality of region segmentation results of a target vehicle from an image to be recognized; extracting global feature data and a plurality of pieces of region feature data from the image to be recognized based on the plurality of region segmentation results; and fusing the global feature data and the plurality of pieces of region feature data to obtain appearance feature data of the target vehicle, wherein the extracting global feature data and a plurality of pieces of region feature data from the image to be recognized based on the plurality of region segmentation results comprises: extracting global feature data and the plurality of pieces of region feature data of the target vehicle from the image to be recognized by means of a second neural network for feature extraction based on the plurality of region segmentation results, which comprises: performing point multiplication on the plurality of region segmentation results and a global feature map separately by means of a second computing layer, to obtain local feature maps respectively corresponding to the plurality of region segmentation results; performing a convolution operation and a pooling operation on the local feature maps of the plurality of region segmentation results by means of a second inception module to obtain a second feature map set corresponding to the plurality of region segmentation results; and performing a pooling operation on the second feature map set corresponding to the plurality of region segmentation results by means of a second pooling layer to obtain first region feature vectors corresponding to the plurality of region segmentation results, wherein before the performing point multiplication on the plurality of region segmentation results and a global feature map separately by means of a second computing layer, the steps further comprise: respectively scaling the plurality of region segmentation results to the same size as the size of the global feature map by means of the second computing layer.

16. The apparatus according to claim 15, wherein the plurality of region segmentation results respectively correspond to regions of different orientations of the target vehicle, wherein the plurality of region segmentation results comprise segmentation results of a front side, a rear side, a left side, and a right side of the target vehicle.
17. The apparatus according to claim 15, wherein the obtaining a plurality of region segmentation results of a target vehicle from an image to be recognized comprises: obtaining the plurality of region segmentation results of the target vehicle from the image to be recognized by means of a first neural network for region extraction.
18. The apparatus according to claim 17, wherein the first neural network has a first feature extraction layer and a first computing layer connected to a tail end of the first feature extraction layer, wherein the obtaining a plurality of region segmentation results of the target vehicle from the image to be recognized by means of a first neural network for region extraction comprises: performing feature extraction on the image to be recognized by means of the first feature extraction layer to obtain a plurality of key points of the target vehicle; and classifying the plurality of key points by means of the first computing layer to obtain a plurality of key point clusters, and respectively fusing feature maps of key points in the plurality of key point clusters, to obtain region segmentation results corresponding to the plurality of key point clusters.
19. An apparatus for vehicle retrieval, comprising: a memory storing processor-executable instructions; and a processor arranged to execute the processor-executable instructions to perform steps of: obtaining appearance feature data of a target vehicle in an image to be retrieved by means of the method according to claim 1; and searching a candidate vehicle image library for a target candidate vehicle image matching the appearance feature data.
20. A non-transitory computer readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to implement steps of a method for vehicle appearance feature recognition, the method comprising: obtaining a plurality of region segmentation results of a target vehicle from an image to be recognized; extracting global feature data and a plurality of pieces of region feature data from the image to be recognized based on the plurality of region segmentation results; and fusing the global feature data and the plurality of pieces of region feature data to obtain appearance feature data of the target vehicle, wherein the extracting global feature data and a plurality of pieces of region feature data from the image to be recognized based on the plurality of region segmentation results comprises: extracting global feature data and the plurality of pieces of region feature data of the target vehicle from the image to be recognized by means of a second neural network for feature extraction based on the plurality of region segmentation results, which comprises: performing point multiplication on the plurality of region segmentation results and a global feature map separately by means of a second computing layer, to obtain local feature maps respectively corresponding to the plurality of region segmentation results; performing a convolution operation and a pooling operation on the local feature maps of the plurality of region segmentation results by means of a second inception module to obtain a second feature map set corresponding to the plurality of region segmentation results; and performing a pooling operation on the second feature map set corresponding to the plurality of region segmentation results by means of a second pooling layer to obtain first region feature vectors corresponding to the plurality of region segmentation results, wherein before the performing point multiplication on the plurality of region segmentation results and a global feature map separately by means of a second computing layer, the method further comprises: respectively scaling the plurality of region segmentation results to the same size as the size of the global feature map by means of the second computing layer.